Process-related user interaction logs: State of the art, reference model, and object-centric implementation

User interaction (UI) logs are high-resolution event logs that record low-level activities performed by a user during the execution of a task in an information system. Each event in such a log represents an interaction between the user and the interface, such as clicking a button, ticking a checkbox, or typing into a text field. UI logs are used in many different application contexts for purposes such as usability analysis, task mining, or robotic process automation (RPA). However, UI logs suffer from a lack of standardization: each research study and processing tool relies on a different conceptualization and implementation of the elements and attributes of user interactions. This complicates or even prevents the integration of UI logs from different sources or the combination of UI data collection tools with downstream analytics or automation solutions. In this paper, our objective is to address this issue and facilitate the exchange and analysis of UI logs in research and practice. To this end, we first review process-related UI logs in scientific publications and industry tools to determine commonalities and differences between them. Based on our findings, we propose a universally applicable reference data model for process-related UI logs, which includes all core attributes but remains flexible regarding the scope, level of abstraction, and case notion. Finally, we provide exemplary implementations of the reference model in XES and OCED.
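
A minimal sketch of what one such event could look like when serialized in XES, assuming hypothetical core attributes (user, application, UI element, action, typed value) rather than the attribute set actually defined in the paper's reference model; the standard XES keys time:timestamp, concept:name and org:resource exist, while the app:/ui: keys below are illustrative extensions:

```python
# Illustrative sketch only: the app:/ui: attribute keys are assumptions,
# not the paper's actual reference model.
import xml.etree.ElementTree as ET

def ui_event_to_xes(timestamp, user, application, ui_element, action, value=None):
    """Serialize a single user-interaction event as an XES-style <event>."""
    event = ET.Element("event")
    ET.SubElement(event, "date", key="time:timestamp", value=timestamp)
    ET.SubElement(event, "string", key="org:resource", value=user)        # standard XES key
    ET.SubElement(event, "string", key="concept:name", value=action)      # standard XES key
    ET.SubElement(event, "string", key="app:name", value=application)     # assumed extension
    ET.SubElement(event, "string", key="ui:element", value=ui_element)    # assumed extension
    if value is not None:
        ET.SubElement(event, "string", key="ui:value", value=value)       # assumed extension
    return ET.tostring(event, encoding="unicode")

print(ui_event_to_xes("2024-05-01T09:15:30.000+00:00", "alice",
                      "web_browser", "button_submit", "click"))
```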

Open Access
Enjoy the silence: Analysis of stochastic Petri nets with silent transitions

Capturing stochastic behaviour in business and work processes is essential to quantitatively understand how nondeterminism is resolved when making decisions within the process. This is of special interest in process mining, where event data tracking the actual execution of the process are related to process models, and can then provide insights on frequencies and probabilities. Variants of stochastic Petri nets provide a natural formal basis to represent stochastic behaviour and support different data-driven and model-driven analysis tasks in this spectrum. However, when capturing business processes, such nets inherently need a labelling that maps between transitions and activities. In many state-of-the-art process mining techniques, this labelling is not one-to-one, leading to unlabelled transitions and activities represented by multiple transitions. At the same time, they have to be analysed in a finite-trace semantics, matching the fact that each process execution consists of finitely many steps. These two aspects impede the direct application of existing techniques for stochastic Petri nets, calling for a novel characterisation that incorporates labels and silent transitions in a finite-trace semantics. In this article, we provide such a characterisation starting from generalised stochastic Petri nets and obtaining the framework of labelled stochastic processes (LSPs). On top of this framework, we introduce different key analysis tasks on the traces of LSPs and their probabilities. We show that all such analysis tasks can be solved analytically, in particular reducing them to a single method that combines automata-based techniques to single out the behaviour of interest within an LSP, with techniques based on absorbing Markov chains to reason about their probabilities. Finally, we demonstrate the significance of our approach in the context of stochastic conformance checking, illustrating practical feasibility through a proof-of-concept implementation and its application to different datasets.
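
As a minimal numerical sketch of the absorbing-Markov-chain step mentioned above (not the LSP framework itself), one can compute, for a toy chain with two transient and two absorbing states, the probability of ending in each absorbing state via the fundamental matrix N = (I - Q)^{-1} and B = N R:

```python
# Toy absorbing Markov chain: two transient states, two absorbing states.
# Q holds transient->transient probabilities, R transient->absorbing ones;
# each full row (Q row + R row) sums to 1. Absorption probabilities: B = (I - Q)^{-1} R.
import numpy as np

Q = np.array([[0.2, 0.3],
              [0.1, 0.2]])
R = np.array([[0.4, 0.1],
              [0.2, 0.5]])

N = np.linalg.inv(np.eye(Q.shape[0]) - Q)   # fundamental matrix N = (I - Q)^{-1}
B = N @ R                                   # B[i, j]: prob. of absorption in state j from transient state i
print(B)
print(B.sum(axis=1))                        # each row sums to 1: absorption is certain
```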

Open Access
Read-safe snapshots: An abort/wait-free serializable read method for read-only transactions on mixed OLTP/OLAP workloads

This paper proposes Read-Safe Snapshots (RSS), a concurrency control method that ensures that read-only transactions read the latest serializable version under multiversion concurrency control (MVCC) without creating any serializability anomaly, thereby enhancing transaction processing throughput under mixed workloads of online transactional processing (OLTP) and online analytical processing (OLAP). Ensuring serializability for data consistency between OLTP and OLAP is vital to prevent OLAP from obtaining nonserializable results. Existing serializability methods achieve this consistency by making OLTP or OLAP transactions abort or wait, but this can degrade throughput when applied to the large read sets of read-only OLAP transactions under the mixed workloads of recent real-time analysis applications. To deal with this problem, we present an RSS construction algorithm that does not affect conventional OLTP performance and simultaneously avoids producing additional aborts and waits. Moreover, the RSS construction method can easily be applied to the read-only replica of a multinode system as well as to a single-node system because no validation for serializability is required. Our experimental findings showed that RSS could prevent read-only OLAP transactions from creating anomaly cycles under a multinode environment with master-copy replication, achieving serializability with a low overhead of about 15% compared to baseline OLTP/OLAP throughputs under snapshot isolation (SI). The OLTP throughput under our proposed method in a mixed OLTP/OLAP workload was about 45% better than that of SafeSnapshots, a serializable snapshot isolation (SSI) variant equipped with a read-only optimization method, and our method did not degrade OLAP throughput.
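
For context, a minimal sketch of the plain MVCC snapshot read that RSS builds on (generic MVCC, not the paper's RSS construction algorithm): a read-only transaction reads, for each key, the newest version committed no later than its snapshot timestamp, so it neither blocks writers nor aborts:

```python
# Toy multiversion store: each key maps to a list of (commit_ts, value) versions.
# A read-only transaction reads the newest version with commit_ts <= snapshot_ts.
# Illustrates ordinary MVCC snapshot reads, not the Read-Safe Snapshots algorithm.

class MVStore:
    def __init__(self):
        self.versions = {}          # key -> list of (commit_ts, value), kept sorted by commit_ts

    def write(self, key, value, commit_ts):
        self.versions.setdefault(key, []).append((commit_ts, value))
        self.versions[key].sort()

    def read(self, key, snapshot_ts):
        visible = [v for ts, v in self.versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None

store = MVStore()
store.write("x", 10, commit_ts=5)
store.write("x", 20, commit_ts=9)
print(store.read("x", snapshot_ts=7))   # -> 10 (version committed at ts 5)
print(store.read("x", snapshot_ts=12))  # -> 20 (version committed at ts 9)
```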

Open Access
The rise of nonnegative matrix factorization: Algorithms and applications

Although nonnegative matrix factorization (NMF) is widely used, some matrix factorization methods produce misleading results and waste computing resources due to a lack of timely optimization and case-by-case consideration. Therefore, an up-to-date and comprehensive review of its algorithms and applications is needed to promote the improvement and application of NMF. Here, we start by introducing the background and gathering the principles and formulae of NMF algorithms. Dozens of new algorithms have appeared since the birth of NMF in the 1990s, and generally several or more of them are adopted in a single software package written in R, Python, C/C++, etc. In addition, the applications of NMF are analyzed. NMF is most widely used in modern subjects and techniques such as computer science, telecommunications, imaging science, and remote sensing, but it is also increasingly used in traditional subjects such as physics, chemistry, biology, medicine, and psychology, having been adopted by around 130 fields (disciplines) in about 20 years. Finally, the features and performance of different categories of NMF are summarized and evaluated. The summarized advantages and disadvantages, together with the proposed suggestions for improvement, are expected to inform future efforts to refine the mathematical principles and procedures of NMF and to achieve higher accuracy and productivity in practical use.
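
As a concrete reference point for one of the algorithm families such a review covers, here is a minimal sketch of NMF via the classical multiplicative update rules (Frobenius-norm objective); the rank, iteration count, and data are illustrative:

```python
# Basic NMF via multiplicative updates: approximate a nonnegative V as W @ H
# with W, H >= 0, minimising the Frobenius reconstruction error.
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update H, elementwise, stays nonnegative
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update W, elementwise, stays nonnegative
    return W, H

V = np.random.default_rng(1).random((20, 12))  # toy nonnegative data matrix
W, H = nmf(V, rank=4)
print(np.linalg.norm(V - W @ H))               # reconstruction error
```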

A graph neural network with topic relation heterogeneous multi-level cross-item information for session-based recommendation

Session-based recommendation (SBR) mainly analyzes an anonymous user's historical behavior records to predict the next item the user is likely to interact with and recommends it to the user. However, due to the anonymity of users and the sparsity of behavior records, recommendation results are often inaccurate. Existing SBR models mainly consider the order of items within a session and rarely analyze the complex transition relationships between items; in addition, they are inadequate at mining higher-order hidden relationships between different sessions. To address these issues, we propose a topic relation heterogeneous multi-level cross-item information graph neural network (TRHMCI-GNN) to improve recommendation performance. The model attempts to capture hidden relationships between items through topic classification and builds a topic relation heterogeneous cross-item global graph. The graph contains inter-session cross-item information as well as hidden topic relations among sessions. In addition, a self-loop star graph is established to learn the intra-session cross-item information, and self-connection attributes are added to fuse the information of each item itself. Using a channel-hybrid attention mechanism, the item information of different levels is pooled through two channels, max-pooling and mean-pooling, which effectively fuses the item information of the cross-item global graph and the self-loop star graph. In this way, the model captures both the global information of the target item and its individual features, and a label smoothing operation is added for recommendation. Extensive experimental results demonstrate that the recommendation performance of the TRHMCI-GNN model is superior to that of comparable baseline models on the three real datasets Diginetica, Yoochoose1/64 and Tmall. The code is available at https://github.com/usstyangfan/TRHMCI-GNN.
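
A minimal sketch of the two-channel pooling idea mentioned above, with illustrative shapes and a fixed fusion weight standing in for whatever TRHMCI-GNN actually learns:

```python
# Two-channel pooling over the item embeddings of one session:
# max-pooling keeps the most salient feature per dimension, mean-pooling keeps
# the overall tendency; a convex combination fuses the two channels.
# Shapes and the fixed fusion weight are illustrative assumptions only.
import numpy as np

session_items = np.random.default_rng(0).random((6, 32))   # 6 items, 32-dim embeddings

max_channel = session_items.max(axis=0)      # (32,) max-pooled channel
mean_channel = session_items.mean(axis=0)    # (32,) mean-pooled channel

alpha = 0.5                                  # in the paper this weighting would be learned
session_repr = alpha * max_channel + (1 - alpha) * mean_channel
print(session_repr.shape)                    # (32,) fused session representation
```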

An inter-modal attention-based deep learning framework using unified modality for multimodal fake news, hate speech and offensive language detection

Fake news, hate speech and offensive language form a related evil triplet currently affecting modern societies. The text modality has been widely used for the computational detection of these phenomena. In recent times, multimodal studies in this direction are attracting a lot of interest because of the potential offered by other modalities to contribute to the detection of these menaces. However, a major problem in multimodal content understanding is how to effectively model the complementarity of the different modalities given their diverse characteristics and features. From a multimodal point of view, the three tasks have been studied mainly using image and text modalities. Improving the effectiveness of the diverse multimodal approaches is still an open research topic. In addition to the traditional text and image modalities, we consider image–texts, which are rarely used in previous studies but contain useful information for enhancing the effectiveness of a prediction model. To ease multimodal content understanding and enhance prediction, we leverage recent advances in computer vision and deep learning for these tasks. First, we unify the modalities by creating a text representation of the images and image–texts, in addition to the main text. Second, we propose a multi-layer deep neural network with an inter-modal attention mechanism to model the complementarity among these modalities. We conduct extensive experiments on three standard datasets covering the three tasks. Experimental results show that the detection of fake news, hate speech and offensive language can benefit from this approach. Furthermore, we conduct robust ablation experiments to show the effectiveness of our approach. Our model predominantly outperforms prior works across the datasets.
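
Since the modalities are first unified as text representations, a minimal sketch of single-head inter-modal attention could look as follows; the dimensions, the residual fusion, and the single-head form are illustrative assumptions, not the paper's exact layer:

```python
# Toy single-head inter-modal attention: the main-text vector queries the
# text representations derived from the image and the image-text; softmax
# weights decide how much each complementary modality contributes.
# Dimensions are illustrative assumptions only.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

d = 64
rng = np.random.default_rng(0)
text_vec = rng.random(d)                       # main-text representation (query)
other_modalities = rng.random((2, d))          # image-derived text, image-text (keys/values)

scores = other_modalities @ text_vec / np.sqrt(d)
weights = softmax(scores)                      # attention over the other modalities
fused = text_vec + weights @ other_modalities  # residual fusion of complementary information
print(weights, fused.shape)
```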

Open Access
ArZiGo: A recommendation system for scientific articles

The number of scientific publications around the world is increasing at a rate of approximately 4%–5% per year. This has created a need for tools that help identify relevant and high-quality publications. To address this need, search and reference management tools that include some recommendation algorithms have been developed. However, many of these solutions are proprietary tools, and the full potential of recommender systems is rarely exploited. Some solutions provide recommendations for specific domains by using ad-hoc resources, and others do not consider any personalization strategy when generating recommendations. This paper presents ArZiGo, a web-based full prototype system for the search, management, and recommendation of scientific articles, which draws on the Semantic Scholar Open Research Corpus, a continually growing corpus that so far contains more than 190M papers from all fields of science. ArZiGo combines different recommendation approaches within a hybrid system, in a configurable way, to recommend the papers that best suit the preferences of the users. A group of 30 human experts participated in the evaluation of 500 recommendations in 10 research areas, 7 of which belong to Computer Science and 3 to Medicine, with quite satisfactory results. In addition to the appropriateness of the recommended articles, the execution time of the implemented algorithms has also been analyzed.
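
A minimal sketch of the kind of configurable hybrid combination described above, with hypothetical approach names and weights rather than ArZiGo's actual configuration:

```python
# Configurable hybrid recommender sketch: combine the scores produced by several
# recommendation approaches using user-configurable weights.
# The approach names and weights are hypothetical illustrations.

def hybrid_score(paper_scores, weights):
    """paper_scores: {approach: score}, weights: {approach: weight}."""
    return sum(weights.get(a, 0.0) * s for a, s in paper_scores.items())

weights = {"content_based": 0.5, "collaborative": 0.3, "citation_graph": 0.2}
candidates = {
    "paper_A": {"content_based": 0.9, "collaborative": 0.4, "citation_graph": 0.7},
    "paper_B": {"content_based": 0.6, "collaborative": 0.8, "citation_graph": 0.5},
}
ranked = sorted(candidates, key=lambda p: hybrid_score(candidates[p], weights), reverse=True)
print(ranked)   # papers ordered by hybrid score
```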
